Attributes in R Markdown behave differently depending on the type of output you are producing.
We will focus on HTML; PDF output is similar but not identical. Below are examples from our text and other sources (see reference list below).
The YAML header provides details about the document. It is
delimited by three dashes before and after.
The trick to the YAML header is knowing which options to add.
To see which options are available in the YAML header for an HTML document, type the following code in the console.
?rmarkdown::html_document
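The help page is the authoritative list, but the argument names can also be pulled out programmatically. The following is a minimal sketch that assumes the rmarkdown package is installed; the name yaml_options is only illustrative.

```r
# List the arguments of html_document(); most of them can be set as YAML
# keys nested under `html_document:` in the document header.
yaml_options <- names(formals(rmarkdown::html_document))
print(yaml_options)

# Check that two options discussed in this document are available
c("theme", "df_print") %in% yaml_options
```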
# Side-by-Side Figures
Sometimes it is nice to print plots or graphs side by side. In the
console we have done this using the par function. In
R Markdown this can be done with the code chunk options. Setting
fig.show to "hold" and out.width
to "50%" will result in side-by-side graphs. Figures in this case
refers to R-generated graphs and plots, not images.
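The same two options can also be set once for every chunk in a document, rather than in each chunk header. This is a sketch of that alternative, assuming the knitr package is installed; it uses knitr::opts_chunk$set with the values named above.

```r
# Set the chunk options globally, typically in a setup chunk at the top of
# the .Rmd file. Individual chunks can still override them.
knitr::opts_chunk$set(
  fig.show  = "hold",  # hold all plots until the end of the chunk
  out.width = "50%"    # each plot takes half of the available width
)

# Confirm the values that later chunks will inherit
knitr::opts_chunk$get(c("fig.show", "out.width"))
```

Setting the options globally affects every figure in the document, so setting them on a single chunk is usually the better choice when only one pair of plots should sit side by side.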
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
plot <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point()
# left image
plot
# right image
plot + geom_line()
Printing a large dataset in an R Markdown file can result in a lot of
unnecessary scrolling. Using the df_print: paged option in
the YAML header displays data frames as paged tables instead.
data("PimaIndiansDiabetes2", package = "mlbench")
# remove missing values and create temporary dataframe
PID <- na.omit(PimaIndiansDiabetes2)
PID
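When only one data frame should be paged, rather than all of them, rmarkdown::paged_table can be called on that object directly. This is a swapped-in alternative to the df_print: paged YAML option, sketched here with the built-in mtcars data so that it runs without the mlbench package.

```r
# Page a single data frame instead of setting df_print: paged for the
# whole document. mtcars is used only as a stand-in dataset.
paged <- rmarkdown::paged_table(mtcars)

# The result is still a data frame, tagged so that R Markdown renders it
# as a paged table in HTML output.
class(paged)
```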
While visualizing data with graphs is usually the preferred method, sometimes data must be presented in tabular form. For static tables, kable from the knitr package prints nice-looking tables that are adapted to the type of output document.
knitr::kable(head(PID), caption='Tabular data printed using kable.')
| | pregnant | glucose | pressure | triceps | insulin | mass | pedigree | age | diabetes |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | neg |
| 5 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | pos |
| 7 | 3 | 78 | 50 | 32 | 88 | 31.0 | 0.248 | 26 | pos |
| 9 | 2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 | pos |
| 14 | 1 | 189 | 60 | 23 | 846 | 30.1 | 0.398 | 59 | pos |
| 15 | 5 | 166 | 72 | 19 | 175 | 25.8 | 0.587 | 51 | pos |
To use output from code in your explanation
or text, use the inline code feature. To do this,
wrap the expression in backticks and start it with the letter r, as in `r expression`.
For example:
Y <- sqrt(3425)
Y
## [1] 58.5235
If I want to use the value of Y in my text, I can write `r Y` within a sentence, which renders as follows:
The value of Y is 58.5234996.
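To see what R Markdown does with such a line, the sentence can be knitted on its own. This is a minimal sketch assuming the knitr package is installed; the string src is a hypothetical stand-in for one line of an .Rmd file.

```r
library(knitr)

Y <- sqrt(3425)

# One line of R Markdown source: the inline expression is wrapped in
# backticks and starts with the letter r.
src <- "The value of Y is `r Y`."

# knit() evaluates the inline expression and returns the rendered text
rendered <- knit(text = src, quiet = TRUE)
cat(rendered)
```

The number of digits shown is decided by knitr, so wrapping the value in round() inside the inline expression is a simple way to control it.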
To color text in HTML output, use <span>.
Any text written within a span tag with the style="color: purple;" attribute will turn purple.
You can write the color as a name (the available names can be seen here).
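Typing the span tag by hand gets repetitive, so it can be generated from R instead. The helper below is a hypothetical convenience function, not part of any package, and it only makes sense for HTML output.

```r
# Hypothetical helper: wrap text in a <span> with a CSS color for HTML output
color_text <- function(text, color) {
  sprintf('<span style="color: %s;">%s</span>', color, text)
}

# Returns the HTML string that a browser will show in purple
color_text("This text will be purple.", "purple")
```

Calling the helper from inline code places the generated span directly in the rendered text.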
You can organize sections into tabs by adding
{.tabset} to a section header; each subsection one level below that header becomes a tab. To end the tabs, start a new
section header at the same or a higher level.
The text here will appear in tab 1
The text here will appear in tab 2
You can put text and plots in tabs.
plot
To add a theme to your R Markdown file, add theme: to
your YAML header.
For a list of possible themes, visit: https://bootswatch.com
Note: some themes can change the appearance of your output.
If you have location data it may be helpful to use R to create a map of your data. Below is a simple example of creating a map. There are other ways to create more detailed static maps. We will look at how to create interactive maps later.
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(mapdata)
#Create a basic map of US
map('state')
#add title to your map
title('Map of the United States')
Plot using base map and customizable colors.
map('state', col = "darkgray",
    fill = TRUE,
    border = "white")
# add a title to your map
title('Map of the United States')
Create a map of South Carolina with county boundaries. Notice you can create multiple-line titles using \n.
map('county', regions = "South Carolina", col = "darkgray", fill = TRUE, border = "grey80")
map('state', regions = "South Carolina", col = "black", add = TRUE)
# add the x, y location of Clemson using points() (these x, y locations are the decimal degree (DD) coordinates of longitude and latitude)
# two colors and sizes are used to make the symbol look a little brighter
points(x = -82.83737, y = 34.68344, pch = 21, col = "slateblue2", cex = 2)
points(x = -82.83737, y = 34.68344, pch = 8, col = "orangered", cex = 1.3)
# add a title to your map
title('County Map of South Carolina\nClemson location')
You can also stack several map layers using
add=TRUE.
map('state', fill = TRUE, col = "darkgray", border = "white", lwd = 1)
map(database = "usa", lwd = 1, add = TRUE)
# highlight South Carolina; can't forget my homeland
map("state", "south carolina", col = "orangered",
    lwd = 1, fill = TRUE, add = TRUE)
# add a title to your map
title("Clemson\nSouth Carolina")
# add the x, y location of Clemson using the points
points(x = -82.83737, y = 34.68344, pch = 8, col = "slateblue2", cex = 1.3)
To see the different colors available for R graphs, visit here.
The htmlwidgets package enables the simple creation of R
packages that provide R bindings for arbitrary JavaScript libraries.
This provides R users access to a wide array of useful JavaScript
libraries for visualizing data, all within R and without having to learn
JavaScript.
The DT package provides an interactive tabular experience through the DataTables JavaScript library. Since DT is based on htmlwidgets, its full interactivity is only experienced in HTML-based output.
library(DT)
data(diamonds, package='ggplot2')
datatable(head(diamonds, 100))
The DataTables library has many extensions, plugins and options, most of which are implemented by the DT package. To make our table look nicer we turn off rownames; make each column searchable with the filter argument; enable the Scroller extension for better vertical scrolling; allow horizontal scrolling with scrollX; and set the displayed dom elements to be the table itself (t), table information (i) and the Scroller capability (S). Some of these are arguments to the datatable function, and others are specified in a list provided to the options argument. Working out which argument goes where unfortunately requires scouring the DT documentation and vignettes and the DataTables documentation.
datatable(head(diamonds, 100),
          rownames=FALSE,
          extensions='Scroller', filter='top',
          options = list(dom = "tiS", scrollX=TRUE,
                         scrollY = 400,
                         scrollCollapse = TRUE)
)
A datatables object can be passed, via a pipe, to formatting functions to customize the output. The following code builds a datatables object, formats the price column as currency rounded to the nearest whole number and color codes the rows depending on the value of the cut column.
datatable(head(diamonds, 100),
          rownames=FALSE,
          extensions='Scroller', filter='top',
          options = list(dom = "tiS", scrollX=TRUE,
                         scrollY = 400,
                         scrollCollapse = TRUE
          )
) %>%
  formatCurrency('price', digits=0) %>%
  formatStyle(columns='cut',
              valueColumns='cut',
              target='row',
              backgroundColor=styleEqual(levels=c('Good', 'Ideal'), values=c('red', 'green'))
  )
Map capabilities can be extended to interactive maps using the leaflet package. This package creates maps based on OpenStreetMap (or another map provider) that are scrollable and zoomable. It can also use shapefiles, GeoJSON, TopoJSON and raster images to build up the map. To see this in action we plot a list of favorite pizza places, taken from our textbook, on a map.
First we read the JSON file holding the list of favorite pizza places.
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
pizza <- fromJSON('http://www.jaredlander.com/data/PizzaFavorites.json')
pizza
class(pizza$Details)
## [1] "list"
class(pizza$Details[[1]])
## [1] "data.frame"
dim(pizza$Details[[1]])
## [1] 1 4
We see that the Details column is a list-column where each element is a data.frame with four columns. We want to un-nest this structure so that pizza is a data.frame where each row has a column for every column in the nested data.frames. In order to get longitude and latitude coordinates for the pizza places we need to create a character column that is the combination of all the address columns.
library(dplyr)
library(tidyr)
pizza2 <- pizza %>% unnest(cols=c(Details))
pizza2
pizza <- pizza %>% unnest(cols=c(Details)) %>%
  # rename the Address column Street
  rename(Street=Address) %>%
  # create a new column to hold entire address
  unite(col=Address, Street, City, State, Zip,
        sep=', ', remove=FALSE)
pizza
The tidygeocoder package provides the geocode function
to geocode addresses. We use geocode to create columns for latitude and
longitude.
library(tidygeocoder)
pizza <- pizza %>% geocode(Address)
## Passing 9 addresses to the Nominatim single address geocoder
## Query completed in: 9.3 seconds
pizza
Now that we have data with coordinates we can build a map with
markers showing our points of interest. The leaflet
function initializes the map. Running just that renders a blank map.
Passing that object, via pipe, into addTiles draws a map,
based on OpenStreetMap tiles, at minimum zoom and centered on the Prime
Meridian since we did not provide any data. Passing that to the
addMarkers function adds markers at the specified ‘long’
and ‘lat’ of our favorite pizza places. The columns holding the
information are specified using the formula interface. Clicking on the
markers reveals a popup displaying the name and street address of a
pizza place. In an HTML-based document this map can be zoomed and
dragged just like any other interactive map.
library(leaflet)
leaflet() %>% addTiles() %>%
  addMarkers(lng=~long, lat=~lat,
             popup=~sprintf('%s<br/>%s', Name, Street),
             data=pizza
  )
Plotting time series can be done with ggplot2,
quantmod and many other packages, but dygraphs
creates interactive plots. To illustrate, we look at the GDP data from
the World Bank. We use the WDI package to access data
through the World Bank’s API.
library(WDI)
gdp <- WDI(country=c("US", "CA", "SG", "IL"),
           indicator=c("NY.GDP.PCAP.CD"),
           start=1970, end=2021)
names(gdp) <- c("iso2c", "Country", "PerCapGDP", "Year")
head(gdp, 15)
This gives us GDP data in the long format. We convert it to wide format using spread from the tidyr package.
gdpWide <- gdp %>%
  dplyr::select(Country, Year, PerCapGDP) %>%
  tidyr::spread(key=Country, value=PerCapGDP)
head(gdpWide)
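The tidyr documentation marks spread as superseded by pivot_wider, so the same reshaping can also be sketched with the newer function. The toy data frame below uses made-up values purely for illustration (they are not World Bank figures); only the column names mirror the ones used here.

```r
library(tidyr)

# Toy long-format data with made-up numbers (not World Bank figures)
gdp_toy <- data.frame(
  Country   = c("A", "A", "B", "B"),
  Year      = c(2000, 2001, 2000, 2001),
  PerCapGDP = c(10, 11, 20, 22)
)

# One row per Year, one column per Country
gdp_toy_wide <- pivot_wider(gdp_toy,
                            names_from = Country,
                            values_from = PerCapGDP)
gdp_toy_wide
```

The result keeps the time element in the first column with one column per series, which is the layout needed for plotting each country as its own line.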
With the time element in the first column and each time series represented as a single column, we use dygraphs to make an interactive JavaScript plot.
library(dygraphs)
dygraph(gdpWide, main='Yearly Per Capita GDP',
        xlab='Year', ylab='Per Capita GDP') %>%
  dyOptions(drawPoints = TRUE, pointSize = 1) %>%
  dyLegend(width=400)
Hovering over lines of the graph will highlight synchronized points on each line and display the values in the legend. Drawing a rectangle in the graph will zoom into the data. We can add a range selector that can be dragged to show different parts of the graph with dyRangeSelector.
dygraph(gdpWide, main='Yearly Per Capita GDP',
        xlab='Year', ylab='Per Capita GDP') %>%
  dyOptions(drawPoints = TRUE, pointSize = 1) %>%
  dyLegend(width=400) %>%
  dyRangeSelector(dateWindow=c("1990", "2000"))
The threejs package, by Bryan Lewis, has functions for building 3D scatterplots and globes that can be spun around to view different angles. To see this we draw arcs between origin and destination cities of flights that were in the air in the afternoon of January 2, 2017. The dataset contains the airport codes and coordinates of the airports on both ends of the route.
library(readr)
flights <- read_tsv('http://www.jaredlander.com/data/Flights_Jan_2.tsv')
## Rows: 151 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (2): From, To
## dbl (4): From_Lat, From_Long, To_Lat, To_Long
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(flights, 10)
The dataset is already in proper form to draw arcs between destinations and origins. It is also prepared to plot points for the airports, but airports are in the dataset multiple times, so the plot will simply overlay the points. It will be more useful to have counts for the number of times an airport appears so that we can draw one point with a height determined by the number of flights originating from each airport.
airports <- flights %>% count(From_Lat, From_Long) %>%
  arrange(desc(n))
head(airports, 15)
The first argument to globejs is the image to use as a surface map for the globe. The default image is nice, but we use a high-resolution “blue marble” image from NASA instead.
earth <- "http://eoimages.gsfc.nasa.gov/images/imagerecords/73000/73909/world.topo.bathy.200412.3x5400x2700.jpg"
Now that the data are prepared and we have a nice image for the
surface map, we can draw the globe. The first argument,
img, is the image to use, which we saved to the
earth object. The next two arguments, lat and
long, are the coordinates of points to draw. The
value argument controls how tall to draw the points. The
arcs argument takes a four-column data.frame
where the first two columns are the origin latitude and longitude and
the second two columns are the destination latitude and longitude. The
rest of the arguments customize the look and feel of the globe.
library(threejs)
globejs(img=earth, lat=airports$From_Lat,
        long=airports$From_Long,
        value=airports$n*5, color='red',
        arcs=flights %>%
          dplyr::select(From_Lat, From_Long, To_Lat, To_Long),
        arcsHeight=.4, arcsLwd=4,
        arcsColor="#3e4ca2", arcsOpacity=.85,
        atmosphere=TRUE, fov=30, rotationlat=.5,
        rotationlong=-.05)